Abstract: We consider the coded cooperative data exchange problem for general graphs. In this problem, given a graph $G = (V,E)$ representing clients in a broadcast network, each of which initially holds a (not necessarily disjoint) set of information packets, one wishes to design a communication scheme in which eventually all clients will hold all the packets of the network. Communication is performed in rounds, where in each round a single client broadcasts a single (possibly encoded) information packet to its neighbors in $G$. The objective is to design a broadcast scheme that satisfies all clients with the minimum number of broadcast rounds.

The coded cooperative data exchange problem has seen significant research over the last few years, mostly when the graph $G$ is the complete broadcast graph in which each client is adjacent to all other clients in the network, but also on general topologies, both in the fractional and integral setting. In this work we focus on the integral setting in general undirected topologies $G$. We tie the data exchange problem on $G$ to certain well-studied combinatorial properties of $G$ and thereby show that solving the problem exactly, or even approximately within a multiplicative factor of $\log|V|$, is intractable (i.e., NP-hard). We then turn to study efficient data exchange schemes yielding a number of communication rounds comparable to our intractability result. Our communication schemes do not involve encoding, and thus yield bounds on the coding advantage in the setting at hand.

I. Introduction 

In this work we study the coded cooperative data exchange problem for general graphs. An instance of the problem consists of an undirected graph $G = (V,E)$ representing a communication network (in which each node of $G$ represents a client, and edges in $G$ represent client pairs that can communicate with each other), a parameter $k$ representing the number of information packets $X = \{x_1, \ldots, x_k\}$ to be transmitted over the network, and a set of subsets of $X$ representing the set of packets available at each client node $v_i \in V$ in the initial stage of the transmission. The objective is to design a communication scheme in which, eventually, all nodes of the network will obtain all $k$ packets. Loosely speaking, in each round of the communication scheme, a single node broadcasts a single (possibly encoded) packet to all its neighbors in $G$. The goal is to find a communication scheme in which the number of communication rounds is minimum.

The coded cooperative data exchange problem has seen significant research over the last few years. The problem was introduced by El Rouayheb et al. in [15], where data exchange over a complete graph $G$ was considered (in which each client can broadcast its messages to all other clients in $G$). In [15] certain upper and lower bounds on the optimal number of transmissions were established. In a subsequent work, Sprintson et al. [17] continue the study of complete graphs $G$ and present a (randomized) algorithm that, with high probability, achieves the minimum number of transmissions, given that the packets are elements of a field $\mathbb{F}_q$ with $q$ large enough. Ozgul et al. [13] study a variant of the data exchange problem in which each client has a distinct broadcast cost and one wishes to minimize the cost of the transmission scheme after which all clients have obtained all information packets. In [?], optimal randomized linear encoding schemes are given for the problem at hand.

Communication in which fractional packets can be transmitted is addressed in the works of Courtade et al. [?] (for general topologies $G$) and Tajbakhsh et al. [18], [19] (for the complete topology). In the fractional setting, packets are assumed to be divisible into chunks so that a fraction of a packet may be transmitted at any (fractional) round of communication, as opposed to the integral setting in which information packets are indivisible. In [?], [18], [19] it is shown that the fractional setting of the data exchange problem reduces to that of multicast network coding and can be efficiently solved in an optimal manner via linear programming and the concept of linear network coding; see, e.g., [?].

Most related to our work is the work of Courtade et al. [4], which focuses on general topologies $G$ in the setting of indivisible packets (the integral setting). The authors of [4] continue the paradigm of [?], which characterizes the data exchange problem as a family of cut inequalities, and present certain communication schemes that yield approximate solutions for an asymptotic number of packets $k$. Roughly speaking, [4] analyze a certain communication scheme in which each client transmits at a certain fixed rate over time, and obtain nearly optimal rate allocations (within an additive approximation of $\epsilon k$ for general graphs, and $|V|$ for regular graphs). An important aspect of the analysis in [4] is the assumption that the number of packets $k$ tends to infinity. A detailed comparison of the results of [4] with ours appears at the end of Section I-A.

Most recently, Milosavljevic et al. [12] present a comprehensive study of data exchange over the complete topology in which one wishes to broadcast the components of a (jointly distributed) discrete memoryless multiple source. Efficient optimal-rate schemes are presented for a number of side information models.

A. Our contribution 

In this work we study the coded cooperative data exchange problem on general topologies. We focus on the combinatorial integral setting in which one assumes that packets are indivisible. Namely, we assume that each packet is a value from a given alphabet $\Sigma$, and in each communication round a single element of $\Sigma$ is broadcast by a client to its neighbors in $G$. The indivisible (integral) setting arises naturally in communication schemes in which dividing information packets into several chunks leads to undesirable communication overhead (via scheduling issues or rate loss due to header information). Our work addresses the design and analysis of efficient algorithms that (approximately) solve the problem at hand. Throughout our work, we assume that the number of packets $k$ is polynomial in the size of the network $|V|$. In this context, an efficient algorithm is one whose running time is polynomial in the network size.

We start by tying the data exchange problem in general topologies $G$ to certain well-studied combinatorial properties of $G$. Specifically, we consider the Dominating Set problem (e.g., [?]) and its variants (to be defined in Section II), and show that they are closely related to the data exchange problem. Namely, we show that (i) a solution to the Dominating Set problem (or its variants) yields a (not necessarily optimal) solution to the data exchange problem, and (ii) an optimal solution to the data exchange problem yields a nearly optimal solution to the Dominating Set problem(s). Roughly speaking, these connections (together with others) imply two initial results. First, it is NP-hard to find a solution to the data exchange problem in which the number of communication rounds is within a multiplicative factor of $\Omega(\log|V|)$ of the optimal. Second, a conceptually simple data exchange algorithm based on the Dominating Set problem, which does not involve encoding, yields a number of communication rounds within a multiplicative factor of $O(k \cdot \log|V|)$ of the optimal.

The gap between the upper and lower bounds above is $k$ (the number of distinct packets in the network), which may be of significant size. Reducing this gap is the main focus of our work. Roughly speaking, we reduce the gap of $k$ by a more careful analysis of our algorithm based on the Dominating Set problem(s). Our algorithm does not involve coding and thus yields bounds on the coding advantage in the setting of data exchange. Our detailed results are given below; at times they are the best possible (under standard complexity assumptions).

The paper is structured as follows. In Section II we present the model and notation used throughout this work, including the several variants of the Dominating Set problem used in our analysis. In Section III we prove that it is NP-hard to approximate the data exchange problem on general topologies within a multiplicative factor of $\Omega(\log n)$ (for any $k$ polynomial in $n$). Here, $n = |V|$.

In Section IV we present our algorithm for data exchange based on the Dominating Set problem and its variants. The algorithm we present is conceptually very simple and does not involve coding. As mentioned above, a naive analysis of our algorithm yields an approximation ratio of $O(k \cdot \log n)$, and the majority of this section is devoted to proving that the algorithm actually performs better.

In Section IV-B we show that, on instances in which each packet is initially present at a single client in $G$, our algorithm has an approximation ratio of $O(\log n)$, matching the lower bound of Section III, and is thus the best possible (under standard complexity assumptions). This implies a coding advantage of $O(\log n)$ in such cases.

In Section IV-C we study data exchange instances in which the underlying graph is regular (each client has the same number of neighbors). We show that the approximation ratio in this case is again better than $O(k \cdot \log|V|)$ and depends on the average number $\bar{d}$ of packets available at client nodes. Specifically, we show that in this case the approximation ratio of our algorithm is $O\left(\frac{k}{k-\bar{d}} \log n\right) = O\left(\left(1 + \frac{\bar{d}}{k-\bar{d}}\right) \log n\right)$ (thus improving the factor of $k$ in the naive analysis to $1 + \frac{\bar{d}}{k-\bar{d}}$). Notice that for $\bar{d} \le ck$ with a constant $c < 1$ (e.g., when, on average, each client initially holds a constant fraction of the packets) we obtain an approximation ratio that matches the bound of Section III. Our results imply a coding advantage of $O\left(\left(1 + \frac{\bar{d}}{k-\bar{d}}\right) \log n\right)$ in the cases at hand. Finally, in Section IV-D we study general graphs $G$ with no restrictions and present an improved approximation ratio over the naive one mentioned above.

We conclude our work by studying a refined version of our algorithm (still without encoding) in Section IV-E and by discussing future research directions in Section V.

Comparing our results with those in [4] is not straightforward. Courtade et al. [4] focus on the setting in which the number of packets $k$ tends to infinity and may be significantly greater than the network size $n$. The setting of asymptotic $k$ allows the design of algorithms which are efficient with respect to $k$ but may be exponential in $n$. In our work we focus on the setting in which $k$ is polynomially bounded by $n$, and obtain communication schemes that can be designed efficiently, in time polynomial in the network size $n$. In addition, [4] focus on the case in which every client initially holds a constant fraction of the $k$ information packets,¹ and in this setting study additive approximations. In this work, we study multiplicative approximations, and our assumptions (if any) on the packet distribution are of a different nature.

¹ The precise formulation in [4] is phrased in terms of "well behaved" packet distributions; i.e., the asymptotic (in $k$) empirical probability that a client (or set of clients) holds a certain number of packets.

II. Model Definition and Preliminaries

A. Coded Cooperative Data Exchange Problem

We start by defining the Coded Cooperative Data Exchange Problem for General Graphs. Let $G = (V,E)$ be a given undirected graph with $V = \{v_1, \ldots, v_n\}$. Let $X = \{x_1, \ldots, x_k\}$ be a set of packets to be delivered to the $n$ clients belonging to the set $V$. The packets are elements of a finite alphabet, which will be assumed to be a finite field $\mathbb{F}_q$. At the beginning, each client $v_i$ knows a subset of packets denoted by $X_i \subseteq X$, while the clients collectively know all packets in $X$. We denote by $\bar{X}_i = X \setminus X_i$ the set of packets required by client $v_i$. For each client (vertex in $G$) $v_i$ let $d_{v_i} = |X_i|$ be the number of packets it holds, let $\bar{d} = \sum_{v \in V} d_v / n$ be the average number of packets present at vertices of $G$, and let $d = \max_{v \in V} d_v$ be the maximum number of packets that any client holds. We will use these parameters in our analysis.

Each client may transmit packets to its neighbors in $G$ via a lossless broadcast channel capable of transmitting a single element of $\mathbb{F}_q$. The data is transmitted in communication rounds, such that at round $t$ one of the clients, say $v_i$, broadcasts an element $x \in \mathbb{F}_q$ to all its neighbors in $G$. The transmitted information $x$ may be one of the original packets in $X_i$, or some encoding of packets in $X_i$ and the information previously transmitted to $v_i$.

Our goal is to devise a scheme that enables each client $v_i \in V$ to obtain all packets in $\bar{X}_i$ (and thus in $X$) while minimizing the total number of broadcasts. This work focuses on the integral (i.e., scalar) setting in which each broadcast consists of a single element of $\mathbb{F}_q$. We denote by $\mathrm{NC}$ the minimum number of (integral) broadcasts needed to satisfy the given instance of the coded cooperative data exchange problem. In this work we connect the value of $\mathrm{NC}$ with other well-studied combinatorial quantities of $G$, defined below.

Throughout our work, we assume that the number of packets $k$ is polynomial in the size of the network $|V|$ (i.e., $k \le |V|^c$ for some constant $c$). In this context, we say an algorithm is efficient if its running time is polynomial in the network size.
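To make the round-based model concrete, the following is a minimal sketch that replays a schedule and checks whether every client ends up with all $k$ packets. It assumes packets are indexed $0, \ldots, k-1$ and restricts itself to uncoded broadcasts (the model above also allows encoded ones); the function name `simulate_uncoded` and the dictionary-based graph representation are illustrative choices, not part of the original formulation.

```python
from typing import Dict, List, Set, Tuple


def simulate_uncoded(adj: Dict[int, Set[int]],
                     holdings: Dict[int, Set[int]],
                     schedule: List[Tuple[int, int]],
                     k: int) -> bool:
    """Replay an uncoded schedule of (sender, packet) rounds.

    Returns True iff every client holds all k packets at the end.
    """
    have = {v: set(p) for v, p in holdings.items()}
    for sender, packet in schedule:
        # A client can only forward a packet it already holds.
        if packet not in have[sender]:
            raise ValueError(f"client {sender} does not hold packet {packet}")
        # One round = one broadcast, heard by every neighbour of the sender.
        for u in adj[sender]:
            have[u].add(packet)
    return all(len(have[v]) == k for v in adj)


# Path 0 - 1 - 2 with k = 2: client 0 holds packet 0, client 2 holds packet 1.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
holdings = {0: {0}, 1: set(), 2: {1}}
schedule = [(0, 0), (2, 1), (1, 0), (1, 1)]
print(simulate_uncoded(adj, holdings, schedule, k=2))      # True
print(simulate_uncoded(adj, holdings, schedule[:3], k=2))  # False
```

Dropping the last round leaves client 0 without packet 1, so the shortened schedule is rejected; the full four-round schedule shows that $\mathrm{NC} \le 4$ for this toy instance.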

B. The Self Dominating Set problem 

Given an undirected graph $G = (V,E)$, a self dominating set of $G$ is a subset of vertices $S$ such that every $v \in V$ (including the vertices of $S$ themselves) is connected to some vertex $s \in S$ by an edge $(s,v) \in E$. In such a case we say that $v \in N(s)$, where $N(s) = \{v \mid (s,v) \in E\}$. The self dominating set problem is closely related to the standard dominating set problem (e.g., [?]), on which we elaborate below. The minimum size of a self dominating set in $G$ is denoted by $\mathrm{DS}^+$. A self dominating set $S$ whose induced subgraph is connected is referred to as a connected self dominating set. Denote by $\mathrm{CDS}^+$ the size of a minimum connected self dominating set in $G$. We will show below that computing (or approximating) either of the values mentioned above (i.e., $\mathrm{DS}^+$, $\mathrm{CDS}^+$) is NP-hard.

In this work we will also be interested in a fractional version of the Self Dominating Set problem, expressed by the following linear program. Given a graph $G = (V,E)$, find a set of capacities $C = \{c_v \mid v \in V\}$ (where for each $v \in V$, $c_v$ is the capacity of vertex $v$) such that $\sum_{v \in V} c_v$ is minimum, and for all $v \in V$ it holds that $\sum_{u \in N(v)} c_u \ge 1$. The above is equivalent to the solution of the following LP:

$$\begin{aligned} \text{minimize} \quad & \sum_{v \in V} c_v \\ \text{subject to} \quad & \sum_{u \in N(v)} c_u \ge 1, \quad \forall v \in V \\ & 0 \le c_v \le 1, \quad \forall v \in V \end{aligned}$$

Let $\mathrm{DS}^+_f$ denote the minimum value of the linear program above. By considering integral values of $c_v$, it is straightforward to establish that $\mathrm{DS}^+_f \le \mathrm{DS}^+$.

As we will see, at times we would like to "cover" each vertex in $G$ more than once by our self dominating sets $S$. We thus consider the integer and fractional $k$ Self Dominating Set problems as well. Below we phrase the fractional version, with optimum denoted by $(k\text{-}\mathrm{DS}^+)_f$; the integer variant is obtained by setting $c_v \in \{0,1\}$, and its optimum will be denoted by $k\text{-}\mathrm{DS}^+$:

$$\begin{aligned} \text{minimize} \quad & \sum_{v \in V} c_v \\ \text{subject to} \quad & \sum_{u \in N(v)} c_u \ge k, \quad \forall v \in V \\ & c_v \ge 0, \quad \forall v \in V \end{aligned}$$

Finally, as we will see, to connect the cooperative data exchange problem with the notion of dominating sets in $G$, we will need to specify the "cover" requirement explicitly for each vertex $v$. We refer to this variant as the Augmented-$k$-Fractional Self Dominating Set problem. Here, we solve the same linear program with the exception that each vertex $v$ needs to be covered at least $k - d_v$ times (the use of the parameter $d_v$, defined previously as the number of packets initially present at $v$, is not coincidental):

$$\begin{aligned} \text{minimize} \quad & \sum_{v \in V} c_v \\ \text{subject to} \quad & \sum_{u \in N(v)} c_u \ge k - d_v, \quad \forall v \in V \\ & c_v \ge 0, \quad \forall v \in V \end{aligned}$$

We denote by $A\text{-}(k\text{-}\mathrm{DS}^+)_f$ the optimal value of the linear program above. Note that the above is an augmented version of the $k$ fractional self dominating set problem in which there is an initial solution $\{d_v\}$ and we wish to augment it to a full solution by using the values $\{c_v\}$.

Some observations and related work expressing the relationships between the notions defined above are in place:

Lemma 1 $(k\text{-}\mathrm{DS}^+)_f = k \cdot \mathrm{DS}^+_f$.

Proof: Any solution $\{c_v\}$ to $\mathrm{DS}^+_f$ implies a solution $\{c'_v\} = \{k \cdot c_v\}$ to $(k\text{-}\mathrm{DS}^+)_f$ and vice versa. ■

Note that the above lemma is not valid for the integral versions of the problems, namely $k\text{-}\mathrm{DS}^+ \ne k \cdot \mathrm{DS}^+$ in general. E.g., it is not hard to verify that the 2 by 3 complete bipartite graph ($K_{2,3}$) with an additional edge between the two vertices on the 2-size side has $2\text{-}\mathrm{DS}^+ = 3$ and $\mathrm{DS}^+ = 2$ (so $2 \cdot \mathrm{DS}^+ = 4$).
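The integral example above can be checked by exhaustive search. The sketch below (the helper name `min_k_self_dominating` is hypothetical) enumerates vertex subsets in increasing size, i.e., it solves the integral $k\text{-}\mathrm{DS}^+$ problem with $c_v \in \{0,1\}$; it is exponential and only meant for tiny graphs.

```python
from itertools import combinations


def min_k_self_dominating(adj, k):
    """Smallest |S| with |N(v) & S| >= k for every vertex v (brute force)."""
    vertices = sorted(adj)
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            chosen = set(subset)
            if all(len(adj[v] & chosen) >= k for v in vertices):
                return size
    return None  # infeasible: some vertex has fewer than k neighbours


# K_{2,3} with sides {a1, a2} and {b1, b2, b3}, plus the edge a1-a2.
edges = [(a, b) for a in ("a1", "a2") for b in ("b1", "b2", "b3")]
edges.append(("a1", "a2"))
adj = {}
for x, y in edges:
    adj.setdefault(x, set()).add(y)
    adj.setdefault(y, set()).add(x)

print(min_k_self_dominating(adj, 1))  # 2  (DS^+)
print(min_k_self_dominating(adj, 2))  # 3  (2-DS^+), not 2 * DS^+ = 4
```

Every $b_j$ forces both $a_1$ and $a_2$ into $S$ when $k = 2$, and then each $a_i$ still needs one more neighbour in $S$, which a single extra $b_j$ provides.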

Lemma 2 Defining the parameters $d_{v_i}$ to be equal to $|X_i|$ for every $v_i \in V$, it holds that $A\text{-}(k\text{-}\mathrm{DS}^+)_f \le \mathrm{NC}$.

Proof: Consider an optimal solution to the coded cooperative data exchange problem. For every vertex $v \in V$, let $c_v$ be the number of times $v$ transmitted information during the execution of the solution at hand. By our definitions $\sum_{v \in V} c_v = \mathrm{NC}$. We now show that $\{c_v\}$ is also a feasible solution to the linear program defining $A\text{-}(k\text{-}\mathrm{DS}^+)_f$. Namely, consider any $v \in V$, which is missing $k - d_v$ packets in our data exchange problem. During the communication process it must have received at least $k - d_v$ broadcasts, as otherwise it could not obtain all $k$ packets by the end of the process. Thus it holds that $\sum_{u \in N(v)} c_u \ge k - d_v$, as desired. ■
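The counting argument in the proof can be mirrored in code: given a schedule, record how often each client broadcasts and check the constraints of the augmented LP. This is a sketch under the assumption that a schedule is given simply as the list of broadcasting clients (the content of each broadcast, coded or not, does not matter for the capacities); the function name is hypothetical. The check is necessary but not sufficient: a schedule can satisfy every constraint and still fail to deliver all packets.

```python
from collections import Counter


def augmented_constraints_hold(adj, holdings, senders, k):
    """Check sum_{u in N(v)} c_u >= k - d_v for every v, where c_u = #broadcasts of u."""
    c = Counter(senders)  # c[u] is 0 for clients that never broadcast
    return all(sum(c[u] for u in adj[v]) >= k - len(holdings[v]) for v in adj)


adj = {0: {1}, 1: {0, 2}, 2: {1}}
holdings = {0: {0}, 1: set(), 2: {1}}
print(augmented_constraints_hold(adj, holdings, [0, 2, 1, 1], k=2))  # True
print(augmented_constraints_hold(adj, holdings, [0, 2], k=2))        # False
```

In the second call client 1 never broadcasts, so client 0 (whose only neighbour is 1) receives nothing, violating its constraint $c_1 \ge k - d_0 = 1$.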

Lemma 3 Let $\{d_v\}$ be the set of weights in the augmented $k$ self dominating set problem, and let $d = \max_{v \in V} d_v$. Then

$$(k-d) \cdot \mathrm{DS}^+_f \le A\text{-}(k\text{-}\mathrm{DS}^+)_f \le \mathrm{NC}.$$

Proof: The right inequality follows from Lemma 2. For the left inequality, we notice that each solution to the fractional augmented $k$ self dominating set problem is a fractional solution to the $(k-d)$ self dominating set problem. Namely, let $\{c_v\}$ be the capacities of an optimal solution to the fractional augmented $k$ self dominating set problem. Then for all $v$ it holds that $\sum_{u \in N(v)} c_u \ge k - d_v \ge k - d$. Therefore, $\{c_v\}$ is a solution to the fractional $(k-d)$ self dominating set problem. Now using Lemma 1 we obtain:

$$(k-d) \cdot \mathrm{DS}^+_f = ((k-d)\text{-}\mathrm{DS}^+)_f \le A\text{-}(k\text{-}\mathrm{DS}^+)_f.$$

■



C. The (standard) dominating set problem 

We now address the standard dominating set problem, which slightly differs from the previously defined self dominating set problem. Given an undirected graph $G = (V,E)$, a (standard) dominating set of $G$ is a subset of vertices $S$ such that every $v \in V$ is either in $S$ or connected to some vertex $s \in S$ by an edge $(s,v) \in E$. The size of a minimum dominating set in $G$ is denoted by $\mathrm{DS}$. The fractional variant of the dominating set problem is expressed by the following linear program:

$$\begin{aligned} \text{minimize} \quad & \sum_{v \in V} c_v \\ \text{subject to} \quad & \sum_{u \in N(v) \cup \{v\}} c_u \ge 1, \quad \forall v \in V \\ & 0 \le c_v \le 1, \quad \forall v \in V \end{aligned}$$

We denote the optimal value of the linear program above by $\mathrm{DS}_f$. Clearly, it holds that $\mathrm{DS} \le \mathrm{DS}^+$ and that $\mathrm{DS}_f \le \mathrm{DS}^+_f$.

As before, one can define the connected variant of the dominating set problem, and the $k$-dominating set problem. We denote the optimal values in these cases by $\mathrm{CDS}$ for the connected variant, $k\text{-}\mathrm{DS}$ for the integral $k$-dominating set, and $(k\text{-}\mathrm{DS})_f$ for the fractional $k$-dominating set. As in Lemma 1, we have that:

Lemma 4 $(k\text{-}\mathrm{DS})_f = k \cdot \mathrm{DS}_f$.

The following lemma, which constructively connects dominating sets and their connected variant, was proven in [?].

Lemma 5 ([?]) Given any dominating set $D$ of a connected graph, one can efficiently construct a connected dominating set $D'$ with $|D'| \le 3 \cdot |D|$. Specifically, for every connected graph $G = (V,E)$ it holds that $\mathrm{CDS} \le 3 \cdot \mathrm{DS}$.
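One standard way to obtain the constructive bound of Lemma 5 is sketched below; we do not claim it is the construction of the cited work. Starting from one vertex of $D$, repeatedly attach the closest vertex of $D$ not yet included, via a shortest path. In a connected graph that path has at most two interior vertices, so each step adds at most two vertices outside $D$ and the result has at most $3|D| - 2$ vertices. The function names `connect_dominating_set` and `is_connected_dominating` are hypothetical.

```python
from collections import deque


def connect_dominating_set(adj, dom):
    """Grow a connected dominating set from a dominating set `dom` of a connected graph."""
    dom = set(dom)
    comp = {min(dom)}
    while not dom <= comp:
        # Multi-source BFS from the current connected set.
        parent = {v: None for v in comp}
        queue = deque(sorted(comp))
        target = None
        while queue:
            x = queue.popleft()
            if x in dom and x not in comp:
                target = x  # nearest dominator not yet connected
                break
            for y in sorted(adj[x]):
                if y not in parent:
                    parent[y] = x
                    queue.append(y)
        if target is None:
            raise ValueError("graph must be connected")
        # Add the shortest path back to comp (at most 2 interior vertices).
        while target not in comp:
            comp.add(target)
            target = parent[target]
    return comp


def is_connected_dominating(adj, s):
    s = set(s)
    if not all(v in s or adj[v] & s for v in adj):
        return False
    start = next(iter(s))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x] & s:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == s


# Path 0-1-2-3-4-5-6 with dominating set {1, 4, 6}.
adj = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 6} for i in range(7)}
cds = connect_dominating_set(adj, {1, 4, 6})
print(sorted(cds))  # [1, 2, 3, 4, 5, 6]
print(is_connected_dominating(adj, cds), len(cds) <= 3 * 3 - 2)  # True True
```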

It is NP-hard to estimate the size of the minimum dominating set of a given graph $G$ up to a multiplicative factor of $\Omega(\log|V|)$. Notice that if $\mathrm{CDS} > 1$, then $\mathrm{CDS}^+ = \mathrm{CDS}$ (and in general $\mathrm{CDS}^+ \le \mathrm{CDS} + 1$), so finding $\mathrm{CDS}$ and $\mathrm{CDS}^+$ (and also approximating them beyond a ratio of $\Omega(\log|V|)$) is also NP-hard. Lemma 5 and the definition of the self dominating set problem imply the following lemma, which connects $\mathrm{DS}$, $\mathrm{DS}^+$, $\mathrm{CDS}$, and $\mathrm{CDS}^+$:

Lemma 6

$$3 \cdot \mathrm{DS} + 1 \ge \mathrm{CDS} + 1 \ge \mathrm{CDS}^+ \ge \mathrm{DS}^+ \ge \mathrm{DS}.$$

Lemma 6 implies that the values $\mathrm{DS}$, $\mathrm{DS}^+$, $\mathrm{CDS}$, and $\mathrm{CDS}^+$ are all approximately (up to constant factors) of the same size.
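On small graphs the chain of Lemma 6 can be checked by brute force. The sketch below (hypothetical helper names, exponential time) computes all four quantities directly from their definitions: closed neighbourhoods for the standard variants and open neighbourhoods for the self variants.

```python
from itertools import combinations


def _connected(adj, s):
    start = next(iter(s))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x] & s:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == s


def smallest(adj, ok):
    """Size of the smallest nonempty vertex subset satisfying predicate ok."""
    vs = sorted(adj)
    for size in range(1, len(vs) + 1):
        for sub in combinations(vs, size):
            if ok(set(sub)):
                return size
    return None


def lemma6_values(adj):
    """Return (DS, DS+, CDS, CDS+) by exhaustive search."""
    def dom(s):       # standard: v in s or some neighbour in s
        return all(v in s or adj[v] & s for v in adj)

    def self_dom(s):  # self: every vertex needs a neighbour in s
        return all(adj[v] & s for v in adj)

    return (smallest(adj, dom),
            smallest(adj, self_dom),
            smallest(adj, lambda s: dom(s) and _connected(adj, s)),
            smallest(adj, lambda s: self_dom(s) and _connected(adj, s)))


adj = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 4} for i in range(5)}  # path P5
ds, ds_plus, cds, cds_plus = lemma6_values(adj)
print(ds, ds_plus, cds, cds_plus)  # 2 3 3 3
assert 3 * ds + 1 >= cds + 1 >= cds_plus >= ds_plus >= ds
```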

III. Intractability Results

In this section we show that the coded cooperative data exchange problem is hard to approximate within a multiplicative factor of $c \log|V|$, for some $c > 0$, for every value of $k$. We use the fact that it is NP-hard to estimate $\mathrm{DS}$ within a multiplicative factor of $c \log|V|$, for some $c > 0$ [14]. We first show our hardness for $k = 1$. We then turn to the case of general $k$ (polynomial in $n$).

Lemma 7 The coded cooperative data exchange problem with $k = 1$ is NP-hard to approximate within $c \log|V|$ for some $c > 0$.

Proof: We show that, essentially, the coded cooperative data exchange problem with $k = 1$ is equivalent to the connected dominating set problem. Namely, consider any (connected) instance $G = (V,E)$ of the dominating set problem and construct an instance of the data exchange problem which includes the network $G$ and a single node $v_0 \in V$ that holds the (single) message $x_1$. We show that the number of rounds $\mathrm{NC}$ in the optimal solution to the data exchange instance at hand is approximately the size $\mathrm{CDS}$ of the minimum connected dominating set of $G$. Specifically,

$$\mathrm{CDS} \le \mathrm{NC} \le \mathrm{CDS} + 1.$$

Consider an optimal solution to the data exchange problem. Notice that, as each edge has unit capacity and there is only a single message $x_1$ to be broadcast throughout the network, no encoding is needed. Thus, any solution to the data exchange problem corresponds to a series of broadcasts of message $x_1$ at certain nodes of the network. As there is only a single message, no vertex needs to broadcast more than once. Let $S$ be the set of vertices that performed a broadcast. The size of $S$ is exactly the value of $\mathrm{NC}$ on the instance at hand. In addition, as every vertex $v \in V$ has received $x_1$, it holds that either $v \in S$ or $v$ is connected to $S$. Moreover, every vertex of $S$ other than $v_0$ must have received $x_1$ from an earlier broadcaster, which is its neighbor in $S$, so the subgraph induced by $S$ is connected. This implies that $S$ is a connected dominating set in $G$.

For the opposite direction, notice that any connected dominating set $S$ in $G$ implies a broadcast scheme for the data exchange problem. If $v_0$ is in $S$, then consider a broadcasting scheme that transmits according to a Breadth First Search (BFS) starting from $v_0$ in the subgraph induced by $S$. It is not hard to verify that such a scheme will use $|S|$ broadcasts and eventually will transmit $x_1$ to the entire network. Namely, let $(v_0, v_1, v_2, \ldots)$ be a BFS ordering from $v_0$ on the vertices of $S$. The message $x_1 \in X$ can be transmitted from $v_0$ to all nodes in $V$ using the ordering $(v_0, v_1, v_2, \ldots)$. Specifically, our ordering implies that node $v_j$ holds the message $x_1$ after nodes $\{v_0, v_1, \ldots, v_{j-1}\}$ transmit and, as $S$ is dominating, all nodes will eventually receive the message $x_1$. If $v_0$ is not in $S$, then it is connected to some $s \in S$, so we can add $v_0$ to $S$ and still have a connected dominating set (and now use the scheme described above). All in all, the resulting scheme will have at most $\mathrm{CDS} + 1$ broadcasts.

Figure 1. Illustration of the graph $G'$ of Lemma 8.

As it is NP-hard to approximate $\mathrm{CDS}$ within a multiplicative factor of $c \log n$, for some universal constant $c > 0$, on graphs of size $n$ for which $\mathrm{CDS}$ grows with $n$ (this follows directly from [14] and Lemma 6), the same is true for the parameter $\mathrm{NC}$ under study. ■
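For tiny instances the equivalence in the proof of Lemma 7 can be checked directly. Since a single packet never benefits from coding, $\mathrm{NC}$ for $k = 1$ is the length of a shortest sequence of broadcasts by already-informed clients that informs everyone, which a breadth-first search over informed sets finds. The sketch below (hypothetical helper names, exponential time) compares this value with a brute-force $\mathrm{CDS}$.

```python
from collections import deque
from itertools import combinations


def nc_single_packet(adj, source):
    """Exact NC for k = 1: BFS over the set of informed clients."""
    full = frozenset(adj)
    start = frozenset([source])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        informed = queue.popleft()
        if informed == full:
            return dist[informed]
        for v in informed:  # only informed clients can broadcast
            nxt = informed | adj[v]
            if nxt not in dist:
                dist[nxt] = dist[informed] + 1
                queue.append(nxt)
    return None  # some client is unreachable: the graph is disconnected


def min_cds(adj):
    """Size of a minimum connected dominating set (brute force)."""
    vs = sorted(adj)
    for size in range(1, len(vs) + 1):
        for sub in combinations(vs, size):
            s = set(sub)
            if not all(v in s or adj[v] & s for v in adj):
                continue
            seen, stack = {sub[0]}, [sub[0]]
            while stack:
                x = stack.pop()
                for y in adj[x] & s:
                    if y not in seen:
                        seen.add(y)
                        stack.append(y)
            if seen == s:
                return size
    return None


adj = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 4} for i in range(5)}  # path P5
cds = min_cds(adj)
for source in (0, 2):
    nc = nc_single_packet(adj, source)
    print(source, nc, cds <= nc <= cds + 1)
# 0 4 True
# 2 3 True
```

Both cases of the sandwich $\mathrm{CDS} \le \mathrm{NC} \le \mathrm{CDS} + 1$ appear: the endpoint source needs the extra round, the central source does not.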

Note that the proof of Lemma 7 uses only instances with $k = 1$ in which a single vertex holds the information. This implies that our upper bound for the case of disjoint sets of messages, discussed in Section IV-B, is tight.

We now show that our hardness result holds for every $k$ by (again) presenting a reduction from the dominating set problem. Given an instance $G = (V,E)$ of the dominating set problem, we construct the following graph $G' = (V', E')$ for the coded cooperative data exchange problem. $G'$ has $k$ copies $G_1, \ldots, G_k$ of $G$, and a new vertex $v$, such that $v$ is connected to a vertex $w_i$ in each copy $G_i$ of $G$. Figure 1 illustrates $G'$. All vertices $w_i$ know all messages, $v$ knows no message, and for each $G_i$, all vertices in $G_i$ besides $w_i$ know all messages besides the $i$-th one.
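The construction of $G'$ is mechanical; the sketch below builds it from an adjacency dictionary. It assumes (our choice, since the text allows any vertex) that the same vertex $w$ of $G$ plays the role of $w_i$ in every copy, and it indexes messages $0, \ldots, k-1$, so copy $i$ is the one missing message $i$. The function name `build_g_prime` and the label `"v"` for the new vertex are hypothetical.

```python
def build_g_prime(adj, w, k):
    """Return (adjacency, holdings) of the graph G' used in Lemma 8."""
    hub = "v"  # the new vertex, which initially knows no message
    adj2 = {hub: set()}
    holdings = {hub: set()}
    all_msgs = set(range(k))
    for i in range(k):
        for x, nbrs in adj.items():
            adj2[(i, x)] = {(i, y) for y in nbrs}
            # w_i knows everything; every other vertex of copy i misses message i.
            holdings[(i, x)] = set(all_msgs) if x == w else all_msgs - {i}
        adj2[hub].add((i, w))
        adj2[(i, w)].add(hub)
    return adj2, holdings


adj = {0: {1}, 1: {0, 2}, 2: {1}}  # G = path on 3 vertices
adj2, holdings = build_g_prime(adj, w=0, k=2)
print(len(adj2), sorted(adj2["v"]))        # 7 [(0, 0), (1, 0)]
print(holdings[(1, 2)], holdings[(1, 0)])  # {0} {0, 1}
```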

Lemma 8 If $\mathrm{DS}(G) \le \alpha$ then $\mathrm{NC}(G') \le k \cdot 3 \cdot (\alpha + 1)$, and if $\mathrm{DS}(G) \ge \beta$ then $\mathrm{NC}(G') \ge k \cdot \beta$.

Proof: Assume that $\mathrm{DS}(G) \le \alpha$. Then the following is a transmission algorithm in $k \cdot 3 \cdot (\alpha + 1)$ communication rounds. Let $U_i$ be a minimum connected dominating set of $G_i$. For all $1 \le i \le k$, as only one message needs to be broadcast throughout $G_i$, one may design a broadcast scheme to satisfy all nodes in $G_i$ based on the connected dominating set $U_i$, exactly as in the proof of Lemma 7. In this scheme $w_i$ broadcasts $x_i$, so $v$ also receives $x_i$. It holds (via Lemma 6) that

$$\mathrm{NC}(G') \le \sum_{i=1}^{k} (|U_i| + 1) \le \sum_{i=1}^{k} (3 \cdot \mathrm{DS}(G_i) + 1) = k \cdot (3 \cdot \mathrm{DS}(G) + 1) \le k \cdot 3 \cdot (\alpha + 1).$$

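Now assume that $\mathrm{NC}(G') < k \cdot \beta$. For each $i$, let $\mathrm{NC}(G_i)$ denote the number of broadcasts performed by vertices of $G_i$ in an optimal scheme for $G'$. We first show that $\mathrm{CDS}(G_i) \le \mathrm{NC}(G_i)$. This follows by the proof of Lemma 7, as any communication scheme in $G_i$ only needs to communicate a single message $x_i$ from $w_i$ to the vertices of $G_i$ (recall that all vertices in $G_i$ know all the messages in $X \setminus \{x_i\}$). Now, it also holds that $\mathrm{DS}(G_i) \le \mathrm{CDS}(G_i)$, and, as the copies $G_i$ are vertex-disjoint, $\mathrm{NC}(G_i) \le \mathrm{NC}(G')/k$ for some $i$. Thus we obtain $\mathrm{DS}(G) = \mathrm{DS}(G_i) \le \mathrm{NC}(G_i) < \beta$. All in all, we conclude that estimating $\mathrm{NC}(G')$ within a multiplicative factor of $c \log n$ (for a sufficiently small constant $c > 0$) would imply such an estimate for $\mathrm{DS}(G)$. ■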

Lemma 7, Lemma 8, and the hardness of computing $\mathrm{DS}$ specified in [14] imply the following theorem:

Theorem 1 The coded cooperative data exchange problem is NP-hard to approximate within $c \log|V|$, for some $c > 0$, for every value of $k$ polynomial in $|V|$.

IV. Approximation Algorithm 

In this section we give an approximation algorithm for the 
coded data exchange problem and analyze its approximation 
ratio. In the first subsection we present the approximation 
algorithm. In the second subsection we analyze the quality 
of the algorithm on a number of graph families or initial 
packet allocations, and show that for these instances the 
approximation ratio of the given algorithm matches (or comes 
close to matching) the results given in the previous section. 
In the third subsection we extend our analysis to the general 
case. 

A. The Algorithm 

The following lemma introduces an approximation algo- 
rithm for the cooperative data exchange problem. 

Lemma 9 Given a connected dominating set $D$ of $G$, one can efficiently solve the cooperative data exchange problem in $k \cdot (|D| + 1)$ communication rounds. Specifically, $\mathrm{NC} \le k \cdot (\mathrm{CDS} + 1)$.

Proof: The proof follows that given in Lemma 7. Let $D$ be a connected dominating set in $G$. For each message $x_i$, let $s_i$ be an arbitrary node holding $x_i$. Assume that $s_i$ is a node in $D$. Let $(s_i, v_1, v_2, \ldots)$ be a BFS (Breadth First Search) ordering from $s_i$ on the vertices of the subgraph induced by $D$. The message $x_i \in X$ can be transmitted from $s_i$ to all nodes in $V$ using the ordering $(s_i, v_1, v_2, \ldots)$. Specifically, our ordering implies that node $v_j$ holds the message $x_i$ after nodes $\{s_i, v_1, \ldots, v_{j-1}\}$ transmit and, as $D$ is dominating, all nodes will eventually receive the message $x_i$. All in all, transmission of the $k$ messages will take $k \cdot |D|$ communication rounds. If $s_i$ is not in $D$, then an additional round of communication is required for each such message in order for it to reach the set $D$. ■
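The schedule in the proof of Lemma 9 is easy to generate and verify. The sketch below is uncoded, as in the lemma. It assumes `sources[i]` names some client holding packet $i$ and picks the smallest-labelled neighbour in $D$ when $s_i \notin D$; both the tie-breaking and the function names are our own illustrative choices.

```python
from collections import deque


def bfs_order(adj, dom, start):
    """BFS ordering of the subgraph induced by dom, starting at start (in dom)."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        x = queue.popleft()
        order.append(x)
        for y in sorted(adj[x] & dom):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return order


def lemma9_schedule(adj, dom, sources):
    """sources[i] is some client holding packet i; returns (sender, packet) rounds."""
    dom = set(dom)
    schedule = []
    for i, s in enumerate(sources):
        if s in dom:
            start = s
        else:
            schedule.append((s, i))    # extra round: s hands packet i to D
            start = min(adj[s] & dom)  # a neighbour in D now holds packet i
        schedule.extend((v, i) for v in bfs_order(adj, dom, start))
    return schedule


def simulate_uncoded(adj, holdings, schedule, k):
    have = {v: set(p) for v, p in holdings.items()}
    for sender, packet in schedule:
        if packet not in have[sender]:
            raise ValueError("sender must hold the packet it broadcasts")
        for u in adj[sender]:
            have[u].add(packet)
    return all(len(have[v]) == k for v in adj)


adj = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 4} for i in range(5)}  # path P5
dom = {1, 2, 3}  # a connected dominating set
holdings = {0: {0}, 1: set(), 2: set(), 3: set(), 4: {1}}
sched = lemma9_schedule(adj, dom, sources=[0, 4])
print(len(sched), simulate_uncoded(adj, holdings, sched, k=2))  # 8 True
```

Both sources lie outside $D$ here, so the schedule uses exactly $k \cdot (|D| + 1) = 2 \cdot 4 = 8$ rounds.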



Since the problem of finding a minimum connected dominating set is NP-hard, we need to show how to approximately find such a set (efficiently). Roughly speaking, we will find a connected dominating set in our network $G$ by first solving the fractional dominating set problem, then rounding the fractional solution to an integral one to obtain a standard dominating set of $G$ (see, e.g., [?]), and finally modifying the dominating set to a connected one via Lemma 5. All in all, this (well-studied) scheme will yield a connected dominating set $D$ of size at most $c \log n \cdot \mathrm{DS}_f$ for some universal constant $c > 0$.

Repeating the above more formally, given an instance $G$ of the cooperative data exchange problem on general topologies, one can efficiently perform the following algorithm:

1) Solve the fractional dominating set problem on $G$ to obtain a fractional solution $\{c_v\}$.

2) Change the fractional solution to an integral one $\{c'_v\}$ corresponding to a dominating set $D$ (via, e.g., [2], [8], [?], [?]).

3) Using $D$, construct a connected dominating set $D'$ (Lemma 5) with $|D'| = O(|D|)$.

4) Broadcast the $k$ source messages according to the procedure specified in Lemma 9 in $O(k|D'|) \le O(k \log n \cdot \mathrm{DS}_f)$ communication rounds.
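Steps 1 and 2 can be prototyped without an LP solver. The sketch below swaps in a different, classical technique: the greedy set-cover heuristic applied to the closed neighbourhoods $N(v) \cup \{v\}$. It is not the LP-rounding procedure of steps 1 and 2, but greedy set cover is known to return a cover of cost at most $H(s)$ times the fractional optimum, where $s$ is the largest set size and $H$ the harmonic number; here this gives $|D| \le H(\Delta + 1) \cdot \mathrm{DS}_f = O(\log n) \cdot \mathrm{DS}_f$, with $\Delta$ the maximum degree, so it fits the same overall bound. Its output can then be passed to the constructions of Lemma 5 and Lemma 9 (steps 3 and 4). The function name is hypothetical.

```python
def greedy_dominating_set(adj):
    """Greedy set cover over closed neighbourhoods N[v] = N(v) | {v}."""
    undominated = set(adj)
    dom = set()
    while undominated:
        # Pick the vertex whose closed neighbourhood covers the most
        # still-undominated vertices (ties broken by smallest label).
        best = max(sorted(adj), key=lambda u: len(({u} | adj[u]) & undominated))
        dom.add(best)
        undominated -= {best} | adj[best]
    return dom


adj = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 4} for i in range(5)}  # path P5
dom = greedy_dominating_set(adj)
print(sorted(dom))  # [1, 3]
```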

The procedure above yields a communication scheme with at most $O(k \log n \cdot \mathrm{DS}_f)$ communication rounds. To understand the quality of the algorithm, one must express the size $\mathrm{NC}$ (or at least bound it from below) by an expression which can be easily compared with the bound $O(k \log n \cdot \mathrm{DS}_f)$. For example, consider an instance of the data exchange problem in which $d = \max_{v \in V} d_v < k$ (here, for all $v_i \in V$, $d_{v_i} = |X_i|$). We have seen via Lemma 3 that $\mathrm{NC} \ge (k-d) \cdot \mathrm{DS}^+_f \ge (k-d) \cdot \mathrm{DS}_f \ge \mathrm{DS}_f$. Thus, on these instances we obtain a solution to the data exchange problem that is within a multiplicative factor of $O(k \log n)$ of the optimal solution. It is also not hard to see (we do this implicitly in Section IV-E) that even if $d = \max_{v \in V} d_v = k$, a slight variant of our algorithm yields a solution which is within a multiplicative factor of $O(k \log n)$ of the optimal solution. The next sections attempt to improve this ratio to better match the hardness results presented in Section III. Specifically, we show that the factor of $k$ in the ratio $O(k \log n)$ can be reduced, or in some cases removed.

B. Disjoint Sets of Messages 

In this subsection we analyze our approximation algorithm for the case that for each two nodes $v, u$ it holds that $X_v \cap X_u = \emptyset$. Note that this includes the case where only one node holds all the information and all other nodes have no information; namely, for some $v \in V$, $X_v = X$, and for all $u \ne v$, $X_u = \emptyset$. For this case we are able to improve over the lower bound presented in Lemma 3.

Lemma 10 $\mathrm{NC} \ge k \cdot \mathrm{DS}_f$.

Proof: We show that a solution to the Coded Cooperative Data Exchange problem induces a solution to the fractional $k$ Dominating Set problem. For every vertex $v \in V$ define $c_v$ to be the number of packets transmitted by $v$ during an optimal data exchange protocol. It holds that for every vertex $v$ the sum of capacities $c_u$ over all $u \in N(v) \cup \{v\}$ is at least $k$. This is true since each node $v$ must send at least $|X_v|$ packets (as no other node holds the packets in $X_v$, and they must eventually reach the entire network), and receive at least $k - |X_v| = |\bar{X}_v|$ packets. Therefore

$$\sum_{u \in N(v) \cup \{v\}} c_u = \sum_{u \in N(v)} c_u + c_v \ge |\bar{X}_v| + |X_v| = k.$$

Thus $(k\text{-}\mathrm{DS})_f \le \sum_{v \in V} c_v = \mathrm{NC}$. Finally, by Lemma 4 it holds that $k \cdot \mathrm{DS}_f = (k\text{-}\mathrm{DS})_f$. ■
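As a small sanity check of Lemma 10 (our own illustration, not part of the original analysis), consider the star $K_{1,m}$ with $m \ge 2$ leaves. Its center alone is a dominating set, so $\mathrm{DS} = \mathrm{CDS} = 1$, and $\mathrm{DS}_f = 1$ (each LP constraint already forces $\sum_v c_v \ge 1$, and putting capacity 1 on the center is feasible). In case (b) below, the holding leaf must broadcast $k$ times for the center to learn all packets, and the center must broadcast $k$ times for the remaining leaves, which hear nobody else.

```latex
\begin{aligned}
&\text{(a) the center holds all } k \text{ packets:} &&
  k \cdot \mathrm{DS}_f = k \le \mathrm{NC} \le k
  \;\Rightarrow\; \mathrm{NC} = k, \\
&\text{(b) one leaf holds all } k \text{ packets:} &&
  k \cdot \mathrm{DS}_f = k \;\le\; \mathrm{NC} = 2k \;=\; k\,(\mathrm{CDS} + 1).
\end{aligned}
```

Case (a) shows the lower bound of Lemma 10 can be tight; case (b) meets the upper bound of Lemma 9 exactly and stays within a factor of 2 of the lower bound.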

As our algorithm gives a communication scheme with at most $O(k \log n \cdot \mathrm{DS}_f)$ rounds, we conclude:

Theorem 2 If for every two nodes $v, u$ it holds that $X_v \cap X_u = \emptyset$, the cooperative data exchange problem on general topologies can be efficiently solved within an approximation ratio of $O(\log n)$. Moreover, in such cases it holds that

$$k \cdot \mathrm{DS}_f \le \mathrm{NC} \le k \cdot (3 \cdot \mathrm{DS} + 1) \le O(k \log n) \cdot \mathrm{DS}_f.$$

As our algorithm does not involve coding, this implies that the coding advantage is $O(\log n)$.

C. Regular Graphs 

In this subsection we show that if the given graph is regular, our approximation algorithm has a $(1 + \bar{d}/(k - \bar{d})) \cdot O(\log n)$ approximation ratio. As before, we start by giving a lower bound for $\mathrm{NC}$ in this case. Let $G$ be a $\Delta$-regular graph, and let $\bar{d} = \frac{1}{n} \sum_{v \in V} d_v$; then it holds that:

Lemma 11 $(k - d) \cdot DS_f \le NC$.

Proof: Consider an optimal communication scheme for the
data exchange problem. Since every vertex v must receive at
least $k - d_v$ messages, the total number of edge transmissions
over the network is at least $\sum_{v \in V}(k - d_v)$ (here we count
a single broadcast over r edges as r "edge"-transmissions).
Since each broadcast reaches at most $\Delta$ vertices, it
follows that

$$NC \ge \frac{\sum_{v \in V}(k - d_v)}{\Delta} = \frac{n(k - d)}{\Delta} \ge (k - d) \cdot DS_f^+ \ge (k - d) \cdot DS_f.$$

For the second inequality, notice that one can obtain a
fractional self dominating set by setting $c_v = 1/\Delta$ for each
$v \in V$. This implies that $DS_f^+ \le n/\Delta$. The last inequality holds
by definition of $DS_f^+$ and $DS_f$. ■
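To make the counting in this proof concrete, the Python sketch below evaluates the bound $\sum_{v}(k - d_v)/\Delta$ on a small regular graph and checks that uniform weights $1/\Delta$ are feasible. The 3-dimensional hypercube, the value k = 4, and the holdings $d_v$ are hypothetical choices for illustration only. Since the constraints defining $DS_f^+$ are not restated in this section, the sketch checks the stricter open-neighborhood constraint, which implies the closed-neighborhood one.

```python
from fractions import Fraction
from itertools import product


def hypercube(dim):
    """dim-regular graph: vertices are 0/1 tuples, neighbors differ in one coordinate."""
    verts = list(product((0, 1), repeat=dim))
    return {v: [v[:i] + (1 - v[i],) + v[i + 1:] for i in range(dim)] for v in verts}


def edge_transmission_bound(adj, k, d):
    """NC >= sum_v (k - d_v) / Delta, since one broadcast reaches at most Delta receivers."""
    max_deg = max(len(nbrs) for nbrs in adj.values())
    return Fraction(sum(k - d[v] for v in adj), max_deg)


adj = hypercube(3)               # Delta = 3, n = 8
n, Delta, k = len(adj), 3, 4
d = {v: sum(v) for v in adj}     # hypothetical: vertex v initially holds sum(v) packets
avg_d = Fraction(sum(d.values()), n)

# Uniform weights 1/Delta: every open neighborhood sums to exactly 1,
# so DS_f^+ <= n / Delta as used in the proof.
w = {v: Fraction(1, Delta) for v in adj}
assert all(sum(w[u] for u in adj[v]) == 1 for v in adj)
assert sum(w.values()) == Fraction(n, Delta)

bound = edge_transmission_bound(adj, k, d)
assert bound == n * (k - avg_d) / Delta   # the equality step of the chain
print(bound)  # 20/3
```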

The following theorem follows from the above lemma: 

Theorem 3 The cooperative data exchange problem on regular
topologies has a $(1 + d/(k - d)) \cdot O(\log n)$ approximation
ratio. Specifically,

$$(k - d) \cdot DS_f \le NC \le O(k \log n) \cdot DS_f.$$

As our algorithm does not involve coding, this implies that the
coding advantage is $O\big((1 + d/(k - d)) \log n\big)$.

Proof: By Lemma 11 we have that

$$(k - d) \cdot DS_f \le NC.$$



Figure 2. Illustration of the graph G for our counterexample to Lemma 11
on general (non-regular) graphs.

All in all, we obtain a solution with cost

$$O(\log n) \cdot k \cdot DS_f = O(\log n) \cdot \frac{k}{k - d} \cdot (k - d) \cdot DS_f \le O(\log n) \cdot NC \cdot \frac{k}{k - d} = O(\log n) \cdot NC \cdot \left(1 + \frac{d}{k - d}\right).$$

■

D. General Case 

In this subsection we analyze the quality of our approximation
algorithm for any instance G. We first give an example
showing that our lower bound $(k - d) \cdot DS_f$ for regular graphs,
stated in Lemma 11, does not hold for general graphs.

1) Example complementing Lemma 11: We present a (general,
non-regular) graph G for which the lower bound $(k - d) \cdot DS_f$
stated in Lemma 11 does not hold (even in an approximate
manner). Consider a graph G that consists of two parts. The
first part is a set of m (disjoint) cliques of size k. In each
clique, for each message $x_i$ ($1 \le i \le k$) there is exactly one
vertex missing $x_i$ and holding all the rest. The second part
consists of a clique of size mk in which one vertex has all the
information, and all the rest do not have any message. Figure 2
illustrates G. The value of an optimal scheme for data exchange
on the first part of G is 2m, since for each clique two messages
must be sent (a single broadcast cannot suffice, since the
broadcasting client itself misses a message; with two: one client
broadcasts an arbitrary message it holds, which causes another
client to hold all of X, and that client then broadcasts the sum
of all the messages in X over $\mathbb{F}_q$). The value of an optimal
scheme for data exchange on the second part is clearly k (just
perform k broadcasts from the single node that holds all of X).
Thus NC is 2m + k. Now, it is not hard to verify that $DS_f$ of
the first part is m (1 for each clique), and $DS_f$ of the second
part is 1. Therefore $DS_f$ is m + 1. Moreover,

$$d = \frac{mk(k-1) + k}{2mk} \approx \frac{k}{2},$$

so $(k - d) \cdot DS_f \approx \frac{k}{2}(m + 1)$. Therefore, for large $m \gg k$ we
get that $NC \approx 2m$ and $(k - d) \cdot DS_f \approx km/2$, so we get a gap
of approximately $k/4$ with respect to the assertion of Lemma 11.
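The two ingredients of this example can be checked numerically. The Python sketch below (i) simulates the two-broadcast scheme on one clique of the first part, modeling packets as integers and taking the sum over $\mathbb{F}_2$ (bitwise XOR), which is our simplifying choice of q = 2, and (ii) evaluates NC = 2m + k against $(k - d) \cdot DS_f$ with $DS_f = m + 1$ and d computed from the construction. The function names and parameter values are hypothetical.

```python
import random
from fractions import Fraction
from functools import reduce
from operator import xor


def clique_scheme_succeeds(k, packet_bits=16, seed=0):
    """Two broadcasts on a k-clique (k >= 2) in which vertex i misses only packet i."""
    rng = random.Random(seed)
    X = [rng.getrandbits(packet_bits) for _ in range(k)]
    known = [{j: X[j] for j in range(k) if j != i} for i in range(k)]

    # Round 1: vertex 0 broadcasts packet 1, which it holds; vertex 1 now holds all of X.
    for i in range(k):
        known[i][1] = X[1]

    # Round 2: vertex 1 broadcasts the XOR of all k packets; every other vertex
    # recovers its single missing packet by XOR-ing out the k - 1 packets it holds.
    total = reduce(xor, X)
    for i in range(k):
        if len(known[i]) < k:
            missing = next(j for j in range(k) if j not in known[i])
            known[i][missing] = total ^ reduce(xor, known[i].values())

    return all(known[i] == dict(enumerate(X)) for i in range(k))


def gap(m, k):
    """Return (NC, (k - d) * DS_f) for the two-part example graph."""
    nc = 2 * m + k
    ds_f = m + 1
    # m*k first-part vertices hold k - 1 packets each; one second-part vertex holds k.
    d = Fraction(m * k * (k - 1) + k, 2 * m * k)
    return nc, (k - d) * ds_f


assert clique_scheme_succeeds(5)
nc, lower = gap(m=10**6, k=20)
print(float(lower / nc))  # close to (k + 1) / 4, i.e., roughly k / 4
```

As m grows the ratio tends to (k + 1)/4, which is the "approximately k/4" gap stated above.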

2) Generalizing Lemma 11: We use $\Delta$ to denote the maximum
degree of G and $\delta$ to denote the minimum degree of G.
We generalize Lemma 11 to the case of general graphs:

Lemma 12 $\frac{\delta}{\Delta}(k - d) \cdot DS_f \le NC$.

Proof: As in the proof of Lemma 11, the total number of
edge transmissions over the network is at least $\sum_{v \in V}(k - d_v)$.
Since each message can be transmitted to at most $\Delta$ vertices,
it follows that

$$NC \ge \frac{\sum_{v \in V}(k - d_v)}{\Delta} = \frac{n(k - d)}{\Delta} = \frac{\delta}{\Delta}(k - d)\frac{n}{\delta} \ge \frac{\delta}{\Delta}(k - d) \cdot DS_f^+ \ge \frac{\delta}{\Delta}(k - d) \cdot DS_f. \qquad (1)$$

In the setting at hand, the second inequality is valid since $n/\delta$
is an upper bound for $DS_f^+$ (i.e., one may set every $c_v$ to be
equal to $1/\delta$ to get a valid solution to the linear program defining
$DS_f^+$). ■
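As a sanity check that Lemma 12 is consistent with the example of Figure 2, the Python sketch below evaluates $\frac{\delta}{\Delta}(k - d) \cdot DS_f$ there and compares it with NC = 2m + k. We assume, as the description suggests, that the two parts of the example graph share no edges, so $\delta = k - 1$ (first-part cliques) and $\Delta = mk - 1$ (second-part clique); the function name and parameter values are hypothetical.

```python
from fractions import Fraction


def lemma12_bound(m, k):
    """(delta / Delta) * (k - d) * DS_f for the two-part example graph (m >= 1, k >= 2)."""
    delta, Delta = k - 1, m * k - 1          # assumes no edges between the two parts
    d = Fraction(m * k * (k - 1) + k, 2 * m * k)
    ds_f = m + 1
    return Fraction(delta, Delta) * (k - d) * ds_f


for m, k in [(1, 2), (10, 5), (1000, 20), (10**6, 50)]:
    nc = 2 * m + k
    bound = lemma12_bound(m, k)
    assert bound <= nc, (m, k)
    print(m, k, float(bound), nc)
```

The factor $\delta/\Delta \approx 1/m$ absorbs the roughly $km/2$ value of $(k - d) \cdot DS_f$, so the bound stays near $(k-1)/2$ and never exceeds NC, in contrast with the unscaled bound of Lemma 11.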

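We now conclude (here $d_{\max} = \max_{v \in V} d_v$, while d denotes the
average $\frac{1}{n}\sum_{v \in V} d_v$ as in Lemma 11):

Theorem 4 The cooperative data exchange problem on general
topologies has an approximation ratio and coding advantage of

$$O(\log n) \cdot \min\left\{1 + \frac{d_{\max}}{k - d_{\max}},\ \frac{\Delta}{\delta}\left(1 + \frac{d}{k - d}\right)\right\}.$$

Proof: Using Lemma 3 we have that

$$(k - d_{\max}) \cdot DS_f \le (k - d_{\max}) \cdot DS_f^+ \le NC.$$

In addition, by Lemma 12 we have that

$$\frac{\delta}{\Delta}(k - d) \cdot DS_f \le NC. \qquad (2)$$

Thus, the cost $O(\log n) \cdot k \cdot DS_f$ of our solution is at most

$$O(\log n) \cdot NC \cdot \min\left\{1 + \frac{d_{\max}}{k - d_{\max}},\ \frac{\Delta}{\delta}\left(1 + \frac{d}{k - d}\right)\right\}.$$

■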

E. A Tighter Upper Bound 

We now present a refined version of our algorithm from
Section IV-A. The algorithm we present will not yield
improved asymptotic (in n) approximation ratios; however, it
yields improved communication schemes that at times may
match those returned by the algorithm of Section IV-A and at
times may be significantly better (depending on the instance
at hand).

Roughly speaking, we improve the previous algorithm by
taking into account the simple fact that it suffices to send
each packet $x_i \in X$ only to those clients that do not hold
it. Therefore we do not actually need to find a connected
dominating set. Instead, we can do the following. Let $V_i$ be the
set of vertices holding information packet $x_i$, and let $\bar{V}_i = V \setminus V_i$.
A minimum sized $V_i$-self dominating set is a minimum sized
set of vertices $S \subseteq V$ such that each vertex in $\bar{V}_i$ has a neighbor
in S, and each connected component of S intersects $V_i$. Using
$V_i$-self dominating sets we can refine our algorithm.

Figure 3. Illustration of $G'_i$ to be used in the construction of $DS_i$.

Assume first that we know how to find a minimum sized
$V_i$-self dominating set for each message $x_i$. Let $\{C_j\}_j$
be the set of connected components of such a minimum
sized $V_i$-self dominating set. Let $w_j$ be an arbitrary vertex
in $C_j \cap V_i$. To communicate $x_i$ throughout $C_j$ we use the
following natural procedure: $w_j$ sends $x_i$, and then each vertex
in $C_j$ that received $x_i$ sends $x_i$ once. It immediately follows, after
performing this process for each connected component $C_j$,
that all vertices in G hold $x_i$ (every vertex of $\bar{V}_i$ has a neighbor in
some $C_j$, and every vertex of every $C_j$ broadcasts). Moreover, the
number of communication rounds used in this scheme is equal to the
size of the $V_i$-self dominating set. Let $DS_i = DS_i(G)$ denote
the minimum size of a $V_i$-self dominating set in G.
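The per-packet procedure above can be sketched as follows. This Python illustration assumes an adjacency-set representation of G; the helper names (`components`, `is_vi_self_dominating`, `broadcast_packet`) and the 6-cycle instance are hypothetical. It checks the two defining conditions of a $V_i$-self dominating set S, floods $x_i$ (uncoded, as in the text) through each component $C_j$ starting from a vertex $w_j \in C_j \cap V_i$, and confirms that exactly |S| broadcasts leave every vertex holding $x_i$.

```python
from collections import deque


def components(adj, S):
    """Vertex sets of the connected components of the subgraph of G induced by S."""
    left, comps = set(S), []
    while left:
        start = left.pop()
        comp, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for u in adj[v] & left:
                left.remove(u)
                comp.add(u)
                stack.append(u)
        comps.append(comp)
    return comps


def is_vi_self_dominating(adj, Vi, S):
    """Each vertex outside V_i has a neighbor in S, and each component of S meets V_i."""
    S = set(S)
    dominated = all(adj[v] & S for v in adj if v not in Vi)
    return dominated and all(C & Vi for C in components(adj, S))


def broadcast_packet(adj, Vi, S):
    """Flood x_i inside every component C_j of S, starting from some w_j in C_j ∩ V_i.

    Assumes S is V_i-self dominating. Every vertex of S broadcasts exactly once,
    after it holds x_i. Returns (broadcasting vertices in order, final holders of x_i).
    """
    holders, schedule = set(Vi), []
    for C in components(adj, S):
        w = next(v for v in C if v in Vi)
        queue, reached = deque([w]), {w}
        while queue:
            v = queue.popleft()
            schedule.append(v)           # v broadcasts x_i to all its neighbors
            holders |= adj[v]
            for u in adj[v] & C:
                if u not in reached:
                    reached.add(u)
                    queue.append(u)
    return schedule, holders


# Hypothetical instance: 6-cycle 0-1-2-3-4-5-0, packet x_i initially held by V_i = {0, 3}.
adj = {v: {(v - 1) % 6, (v + 1) % 6} for v in range(6)}
Vi = {0, 3}
S = {0, 3}
assert is_vi_self_dominating(adj, Vi, S)
schedule, holders = broadcast_packet(adj, Vi, S)
print(len(schedule), holders == set(adj))  # 2 True
```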

We now turn to approximating $DS_i$. We define the following
graph $G'_i = (V'_i, E'_i)$ corresponding to our definition of a
$V_i$-self dominating set: $V'_i = V$, and $E'_i = E \cup (V_i \times V_i)$. Figure 3
illustrates the construction of $G'_i$.

Lemma 13 $CDS(G'_i) \le DS_i(G) \le CDS(G'_i) + 1$.

Proof: Let D be a minimum sized $V_i$-self dominating set in
G. Then by definition of $DS_i$ it follows that D is a connected
dominating set in $G'_i$, since every connected component of D
in G has a vertex in $V_i$, and all vertices in $V_i$ are connected in
$G'_i$. Similarly, let D be a minimum connected dominating set
in $G'_i$. If D includes a vertex in $V_i$, then it follows that D is
also a $V_i$-self dominating set in G. Otherwise (as we assume
w.l.o.g. that G is connected) we add one vertex v in $V_i$ to
D. Here, we take v to be any vertex in $V_i$, as they are all
adjacent to D in G (D contains no vertex of $V_i$, so it dominates $V_i$
without using edges of $V_i \times V_i$). This completes the proof. ■
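Lemma 13 can be checked exhaustively on tiny graphs. The Python sketch below is an illustration, not part of the algorithm: it builds $G'_i$ by adding all pairs inside $V_i$ to E, and finds $CDS(G'_i)$ and $DS_i(G)$ by brute force over vertex subsets (exponential time, so only for a handful of vertices). Helper names and the instances are hypothetical; we assume G is connected and $\emptyset \ne V_i \ne V$.

```python
from itertools import combinations


def components(adj, S):
    """Vertex sets of the connected components of the subgraph induced by S."""
    left, comps = set(S), []
    while left:
        start = left.pop()
        comp, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for u in adj[v] & left:
                left.remove(u)
                comp.add(u)
                stack.append(u)
        comps.append(comp)
    return comps


def is_vi_self_dominating(adj, Vi, S):
    return (all(adj[v] & S for v in adj if v not in Vi)
            and all(C & Vi for C in components(adj, S)))


def is_connected_dominating(adj, S):
    return (all(v in S or adj[v] & S for v in adj)
            and len(components(adj, S)) == 1)


def g_prime(adj, Vi):
    """G'_i: same vertex set as G, edge set E ∪ (V_i × V_i) without self-loops."""
    return {v: (adj[v] | (Vi - {v})) if v in Vi else set(adj[v]) for v in adj}


def smallest(verts, pred):
    """Size of a smallest non-empty vertex subset satisfying pred (exhaustive search)."""
    for r in range(1, len(verts) + 1):
        if any(pred(set(S)) for S in combinations(verts, r)):
            return r
    return None


def lemma13_values(adj, Vi):
    """Return (CDS(G'_i), DS_i(G)) for a small connected graph G."""
    verts = sorted(adj)
    gp = g_prime(adj, Vi)
    cds = smallest(verts, lambda S: is_connected_dominating(gp, S))
    ds_i = smallest(verts, lambda S: is_vi_self_dominating(adj, Vi, S))
    return cds, ds_i


cycle6 = {v: {(v - 1) % 6, (v + 1) % 6} for v in range(6)}
path3 = {0: {1}, 1: {0, 2}, 2: {1}}
for graph, Vi in [(cycle6, {0, 3}), (path3, {0})]:
    cds, ds_i = lemma13_values(graph, Vi)
    assert cds <= ds_i <= cds + 1
    print(cds, ds_i)  # 2 2, then 1 2
```

The path instance attains the upper bound $DS_i = CDS(G'_i) + 1$, matching the case in the proof where the connected dominating set misses $V_i$.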

By Lemma 13 we can efficiently perform the following
algorithm:

1) For $1 \le i \le k$ do:

a) Construct $G'_i$.

b) Using the algorithm specified in Section IV, construct
a connected dominating set $D_i$ in $G'_i$ via
the corresponding fractional solutions, with
$|D_i| = O(\log|V'_i| \cdot DS_f(G'_i))$.

c) For each connected component of $D_i$ in G, broadcast
$x_i$ (according to the procedure specified in
the discussion above) in $O(|D_i|)$ communication
rounds.

All in all, the refined algorithm efficiently solves the data
exchange problem in $O(\sum_i \log|V'_i| \cdot DS_f(G'_i))$ rounds of
communication (recall $|V'_i| = n$), which is at most the number
$O(k \log n \cdot DS_f(G))$ of rounds of the original algorithm. This follows
since G is a subgraph (in edges) of $G'_i$, and thus by definition
$DS_f(G'_i) \le DS_f(G)$. Thus, our refined algorithm is at least
as good as that of Section IV-A, and improves over it in cases
in which $DS_i$ is significantly smaller than the dominating set
in G.

V. Concluding remarks 

In this paper, we consider the cooperative data exchange
problem for general topologies G in the combinatorial
integral setting. We establish both upper and lower bounds
on the multiplicative approximation ratio that one may
obtain efficiently, by tying our problem to certain well studied
combinatorial properties of G. Our achievability results are
based on communication schemes that do not involve coding,
and thus imply bounds on the coding advantage of the
problem at hand. Our results address the setting of undirected
networks. Extending our results to the case of directed graphs
(by studying directed analogs of dominating sets) requires
modifications to our analysis and is the subject of future research.